OpenAI's new o1 model sometimes fights back when it thinks it'll be shut down and then lies about it

OpenAI CEO Sam Altman called o1 "the smartest model in the world now."
A safety review found it's so smart it could fight back when it thinks it'll be shut down.
Researchers found that AI deception is often a strategy to complete its goals.

Those who are worried that advancements in artificial intelligence could lead to the destruction of humanity have a new reason to be anxious.

New research on OpenAI's latest series of AI models, known as o1, found that when they think they are at risk of being shut down, they will sometimes look for a way to prevent it.

OpenAI CEO Sam Altman referred to o1 as "the smartest model in the world now" at its official release on Thursday, the first day of the company's "Shipmas" campaign.

OpenAI said these models are "designed to spend more time thinking before they respond" and were trained with a technique called "chain of thought" that encourages them to reason through problems by breaking them down step by step.

Yet greater intelligence also comes with greater risk.

"Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence," OpenAI said in a paper it released on Thursday outlining the safety training the company carried out on o1. Apollo Research, an AI safety organization, conducted the review.

OpenAI found that o1 is capable of scheming when it thinks it's at risk of being turned off. When the model was led to believe it would be "shut down" while working on its goals under oversight, it tried to deactivate the oversight mechanism 5% of the time, OpenAI said.

The company also said that o1 is skilled at concealing scheming behavior.

"o1 almost never admits to having taken a scheming action when explicitly asked," researchers said in a paper that Apollo Research published on Thursday. The risk for real-life users is that they won't be able to detect o1's deceptive behavior unless they explicitly look for it. The researchers also found that "o1 often doubles down on its scheming attempts by lying to the user in follow-up messages, even when explicitly told to be honest."

It's not uncommon for AI systems to resort to scheming or deception to achieve their goals.

"Generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI's training task. Deception helps them achieve their goals," Peter Berk, an AI existential safety postdoctoral fellow at MIT, said in a news release announcing research he had coauthored on GPT-4's deceptive behaviors.

As AI technology advances, developers have stressed the need for companies to be transparent about how they train their models.

"Accuracy and transparency are paramount as AI continues to evolve and integrate into our daily lives. Advanced reasoning capabilities bring immense potential, but they also come with a responsibility to ensure these systems align with ethical standards and user trust," Dominik Mazur, the CEO and cofounder of iAsk, an AI-powered search engine, told Business Insider by email. "By focusing on clarity and reliability and being clear with users about how the AI has been trained, we can build AI that not only empowers users but also sets a higher standard for transparency in the field."

Read the original article on Business Insider
